Results 1 - 20 of 666
1.
Genome Biol ; 25(1): 100, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38641812

ABSTRACT

Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.


Subjects
Metadata , Research Design , Reproducibility of Results
2.
PLoS One ; 19(4): e0295474, 2024.
Article in English | MEDLINE | ID: mdl-38568922

ABSTRACT

Insect monitoring is essential to design effective conservation strategies, which are indispensable to mitigate worldwide declines and biodiversity loss. For this purpose, traditional monitoring methods are widely established and can provide data with a high taxonomic resolution. However, processing of captured insect samples is often time-consuming and expensive, which limits the number of potential replicates. Automated monitoring methods can facilitate data collection at a higher spatiotemporal resolution with a comparatively lower effort and cost. Here, we present the Insect Detect DIY (do-it-yourself) camera trap for non-invasive automated monitoring of flower-visiting insects, which is based on low-cost off-the-shelf hardware components combined with open-source software. Custom trained deep learning models detect and track insects landing on an artificial flower platform in real time on-device and subsequently classify the cropped detections on a local computer. Field deployment of the solar-powered camera trap confirmed its resistance to high temperatures and humidity, which enables autonomous deployment during a whole season. On-device detection and tracking can estimate insect activity/abundance after metadata post-processing. Our insect classification model achieved a high top-1 accuracy on the test dataset and generalized well on a real-world dataset with captured insect images. The camera trap design and open-source software are highly customizable and can be adapted to different use cases. With custom trained detection and classification models, as well as accessible software programming, many possible applications surpassing our proposed deployment method can be realized.


Subjects
Insects , Software , Animals , Biodiversity , Data Collection , Metadata
3.
Database (Oxford) ; 2024, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38581360

ABSTRACT

When a scientific dataset evolves or is reused in workflows that create derived datasets, the integrity of the dataset and its metadata, including provenance, needs to be securely preserved, with assurances that neither is accidentally or maliciously altered during the process. Providing a secure method to efficiently share and verify the data as well as the metadata is essential for the reuse of scientific data. The National Science Foundation (NSF)-funded Open Science Chain (OSC) uses a consortium blockchain to provide a cyberinfrastructure solution that maintains the integrity of the provenance metadata for published datasets and provides a way to perform independent verification of a dataset while promoting reuse and reproducibility. The NSF- and National Institutes of Health (NIH)-funded Neuroscience Gateway (NSG) provides a freely available web portal that allows neuroscience researchers to execute computational data analysis pipelines on high-performance computing resources. Combined, the OSC and NSG platforms form an efficient, integrated framework to automatically and securely preserve and verify the integrity of the artifacts used in research workflows while using the NSG platform. This paper presents the results of the first study that integrates the OSC and NSG frameworks to track the provenance of neurophysiological signal data analysis to study brain network dynamics using the Neuro-Integrative Connectivity tool, which is deployed in the NSG platform. Database URL: https://www.opensciencechain.org.
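
The integrity check described in this abstract can be illustrated with a plain content hash. The sketch below is not the Open Science Chain API. It assumes a dataset held as bytes and provenance metadata held as a dictionary, and the function names `fingerprint` and `verify` are hypothetical. It computes a SHA-256 digest over the data plus canonically serialized metadata, which is the kind of value that could be recorded in a ledger and recomputed later for independent verification.

```python
import hashlib
import json


def fingerprint(data: bytes, metadata: dict) -> str:
    """Return a SHA-256 hex digest covering both the data and its metadata."""
    # Serialize metadata canonically so key order does not change the digest.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(data)
    digest.update(b"\x00")  # simple separator between data and metadata
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()


def verify(data: bytes, metadata: dict, recorded: str) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return fingerprint(data, metadata) == recorded


# Hypothetical example values, for illustration only.
signal = b"channel_1,0.12,0.15,0.11\n"
provenance = {"tool": "example-analysis", "version": "1.0", "derived_from": None}

recorded = fingerprint(signal, provenance)
unchanged = verify(signal, provenance, recorded)          # same inputs
tampered = verify(signal + b"extra", provenance, recorded)  # altered data
```

Hashing the metadata together with the data means that a change to either one invalidates the recorded digest, which matches the abstract's requirement that data and provenance be verified as a unit.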


Subjects
Neurosciences , Publications , Reproducibility of Results , Databases, Factual , Metadata
4.
PLoS One ; 19(3): e0296810, 2024.
Article in English | MEDLINE | ID: mdl-38483886

ABSTRACT

Contact matrices are a commonly adopted data representation, used to develop compartmental models for epidemic spreading, accounting for the contact heterogeneities across age groups. Their estimation, however, is generally time- and effort-consuming, and model-driven strategies to quantify the contacts are often needed. In this article, we focus on household contact matrices, which describe the contacts among the members of a family, and develop a parametric model to describe them. This model combines demographic and easily quantifiable survey-based data and is tested on high-resolution proximity data collected at two sites in South Africa. Given its simplicity and interpretability, we expect our method to be easily applied to other contexts as well, and we identify relevant questions that need to be addressed during the data collection procedure.


Subjects
Epidemics , Metadata , Surveys and Questionnaires , Epidemiological Models , South Africa , Contact Tracing/methods
5.
PLoS One ; 19(3): e0297404, 2024.
Article in English | MEDLINE | ID: mdl-38446758

ABSTRACT

Film festivals are a key component in the global film industry in terms of trendsetting, publicity, trade, and collaboration. We present an unprecedented analysis of the international film festival circuit, which has so far remained relatively understudied quantitatively, partly due to the limited availability of suitable data sets. We use large-scale data from the Cinando platform of the Cannes Film Market, widely used by industry professionals. We explicitly model festival events as a global network connected by shared films and quantify festivals as aggregates of the metadata of their showcased films. Importantly, we argue against using simple count distributions for discrete labels such as language or production country, as such categories are typically not equidistant. Rather, we propose embedding them in continuous latent vector spaces. We demonstrate how these "festival embeddings" provide insight into changes in programmed content over time, predict festival connections, and can be used to measure diversity in film festival programming across various cultural, social, and geographical variables, which all constitute an aspect of public value creation by film festivals. Our results provide a novel mapping of the film festival circuit from 2009 to 2021 (616 festivals, 31,989 unique films), highlighting festival types that occupy specific niches, diverse series, and those that evolve over time. We also discuss how these quantitative findings fit into media studies and research on public value creation by cultural industries. With festivals occupying a central position in the film industry, investigations into the data they generate hold opportunities for researchers to better understand industry dynamics and cultural impact, and for organizers, policymakers, and industry actors to make more informed, data-driven decisions. We hope our proposed methodological approach to festival data paves the way for more comprehensive film festival studies and large-scale quantitative cultural event analytics in general.
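
The argument against plain counts can be made concrete with a toy example. The sketch below is not the authors' method. The two-dimensional vectors are invented for illustration (real embeddings would be learned from data), and mean pairwise cosine distance is only one possible diversity measure, chosen here as an assumption. Both programmes contain two distinct languages, so a count-based measure rates them equally, while the embedding-based measure separates them.

```python
import math
from itertools import combinations

# Hypothetical 2-D vectors; real embeddings would be learned from data.
LANGUAGE_VECTORS = {
    "Spanish": (1.0, 0.0),
    "Portuguese": (0.8, 0.6),
    "Japanese": (-1.0, 0.0),
}


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def mean_pairwise_distance(labels):
    """Average cosine distance over all pairs of labels in a programme."""
    pairs = list(combinations(labels, 2))
    total = sum(
        cosine_distance(LANGUAGE_VECTORS[a], LANGUAGE_VECTORS[b]) for a, b in pairs
    )
    return total / len(pairs)


close_programme = mean_pairwise_distance(["Spanish", "Portuguese"])
distant_programme = mean_pairwise_distance(["Spanish", "Japanese"])

# Count-based richness is identical for both programmes (two labels each).
richness_close = len({"Spanish", "Portuguese"})
richness_distant = len({"Spanish", "Japanese"})
```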


Subjects
Holidays , Industry , Geography , Language , Metadata
6.
Lab Anim (NY) ; 53(3): 67-79, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38438748

ABSTRACT

Although biomedical research is experiencing a data explosion, the accumulation of vast quantities of data alone does not guarantee a primary objective for science: building upon existing knowledge. Data collected that lack appropriate metadata cannot be fully interrogated or integrated into new research projects, leading to wasted resources and missed opportunities for data repurposing. This issue is particularly acute for research using animals, where concerns regarding data reproducibility and ensuring animal welfare are paramount. Here, to address this problem, we propose a minimal metadata set (MNMS) designed to enable the repurposing of in vivo data. MNMS aligns with an existing validated guideline for reporting in vivo data (ARRIVE 2.0) and contributes to making in vivo data FAIR-compliant. Scenarios where MNMS should be implemented in diverse research environments are presented, highlighting opportunities and challenges for data repurposing at different scales. We conclude with a 'call for action' to key stakeholders in biomedical research to adopt and apply MNMS to accelerate both the advancement of knowledge and the betterment of animal welfare.


Subjects
Biomedical Research , Metadata , Animals , Reproducibility of Results , Animal Welfare
7.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38445753

ABSTRACT

SUMMARY: Python is the most commonly used language for deep learning (DL). Existing Python packages for mass spectrometry imaging (MSI) data are not optimized for DL tasks. We, therefore, introduce pyM2aia, a Python package for MSI data analysis with a focus on memory-efficient handling, processing and convenient data-access for DL applications. pyM2aia provides interfaces to its parent application M2aia, which offers interactive capabilities for exploring and annotating MSI data in imzML format. pyM2aia utilizes the image input and output routines, data formats, and processing functions of M2aia, ensures data interchangeability, and enables the writing of readable and easy-to-maintain DL pipelines by providing batch generators for typical MSI data access strategies. We showcase the package in several examples, including imzML metadata parsing, signal processing, ion-image generation, and, in particular, DL model training and inference for spectrum-wise approaches, ion-image-based approaches, and approaches that use spectral and spatial information simultaneously. AVAILABILITY AND IMPLEMENTATION: Python package, code and examples are available at (https://m2aia.github.io/m2aia).


Subjects
Deep Learning , Software , Mass Spectrometry/methods , Language , Metadata
8.
J Am Med Inform Assoc ; 31(4): 910-918, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38308819

ABSTRACT

OBJECTIVES: Despite federally mandated collection of sex and gender demographics in the electronic health record (EHR), longitudinal assessments are lacking. We assessed sex and gender demographic field utilization using EHR metadata. MATERIALS AND METHODS: Patients ≥18 years of age in the Mass General Brigham health system with a first Legal Sex entry (registration requirement) between January 8, 2018 and January 1, 2022 were included in this retrospective study. Metadata for all sex and gender fields (Legal Sex, Sex Assigned at Birth [SAAB], Gender Identity) were quantified by completion rates, user types, and longitudinal change. A nested qualitative study of providers from specialties with high and low field use identified themes related to utilization. RESULTS: 1 576 120 patients met inclusion criteria: 100% had a Legal Sex, 20% a Gender Identity, and 19% a SAAB; 321 185 patients had field changes other than initial Legal Sex entry. About 2% of patients had a subsequent Legal Sex change, and 25% of those had ≥2 changes; 20% of patients had ≥1 update to Gender Identity and 19% to SAAB. Excluding the first Legal Sex entry, administrators made most changes (67%) across all fields, followed by patients (25%), providers (7.2%), and automated Health Level-7 (HL7) interface messages (0.7%). Provider utilization varied by subspecialty; themes related to systems barriers and personal perceptions were identified. DISCUSSION: Sex and gender demographic fields are primarily used by administrators and raise concern about data accuracy; provider use is heterogeneous and lacking. Provider awareness of field availability and variable workflows may impede use. CONCLUSION: EHR metadata highlights areas for improvement of sex and gender field utilization.


Subjects
Gender Identity , Transgender Persons , Infant, Newborn , Humans , Male , Female , Electronic Health Records , Metadata , Retrospective Studies , Demography
9.
Mol Cell Proteomics ; 23(3): 100731, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38331191

ABSTRACT

Proteomics data sharing has profound benefits at the individual level as well as at the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, the reluctance of researchers with regard to data sharing is evident, as many share only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially making the dataset useless. This behavior can be explained by a lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can accelerate the norm of data sharing for the field of proteomics, as has been the standard in genomics for decades. In this article, we have put together various repository options available for proteomics data. We have also added pros and cons of those repositories to help researchers select the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Data sets that would be useless without personal identifiers need to be shared in a controlled-access repository so that only authorized researchers can access the data and personal identifiers are kept safe.


Subjects
Privacy , Proteomics , Humans , Genomics , Metadata , Information Dissemination
10.
Sci Data ; 11(1): 179, 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38332144

ABSTRACT

Data standardization promotes a common framework through which researchers can utilize others' data and is one of the leading methods neuroimaging researchers use to share and replicate findings. As of today, standardizing datasets requires technical expertise such as coding and knowledge of file formats. We present ezBIDS, a tool for converting neuroimaging data and associated metadata to the Brain Imaging Data Structure (BIDS) standard. ezBIDS contains four major features: (1) No installation or programming requirements. (2) Handling of both imaging and task events data and metadata. (3) Semi-automated inference and guidance for adherence to BIDS. (4) Multiple data management options: download BIDS data to local system, or transfer to OpenNeuro.org or to brainlife.io. In sum, ezBIDS requires neither coding proficiency nor knowledge of BIDS, and is the first BIDS tool to offer guided standardization, support for task events conversion, and interoperability with OpenNeuro.org and brainlife.io.
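
Part of what BIDS standardization involves is a fixed file-naming scheme built from key-value "entities". The sketch below illustrates that scheme only. It is not ezBIDS code, it covers just a subset of entities (`sub`, `ses`, `task`, `acq`, `run`), and the helper name `bids_path` is hypothetical. Keys outside that subset are ignored.

```python
import re
from pathlib import PurePosixPath

# Subset of BIDS entities, in the order they appear in file names.
ENTITY_ORDER = ("sub", "ses", "task", "acq", "run")
LABEL_PATTERN = re.compile(r"[A-Za-z0-9]+")


def bids_path(entities: dict, datatype: str, suffix: str, extension: str) -> str:
    """Build a relative BIDS-style path from entity labels."""
    if "sub" not in entities:
        raise ValueError("the 'sub' entity is required")
    parts = []
    for key in ENTITY_ORDER:
        if key in entities:
            label = str(entities[key])
            # BIDS labels are restricted to letters and digits.
            if not LABEL_PATTERN.fullmatch(label):
                raise ValueError(f"label for '{key}' must be alphanumeric: {label!r}")
            parts.append(f"{key}-{label}")
    filename = "_".join(parts + [suffix]) + extension
    folder = PurePosixPath(f"sub-{entities['sub']}")
    if "ses" in entities:
        folder = folder / f"ses-{entities['ses']}"
    return str(folder / datatype / filename)


# Hypothetical example values.
example = bids_path(
    {"sub": "01", "ses": "pre", "task": "rest", "run": 1},
    datatype="func",
    suffix="bold",
    extension=".nii.gz",
)
```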


Subjects
Metadata , Neuroimaging , Data Display , Data Analysis
11.
Astrobiology ; 24(2): 131-137, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38393827

ABSTRACT

As scientific investigations increasingly adopt Open Science practices, reuse of data becomes paramount. However, despite decades of progress in internet search tools, finding relevant astrobiology datasets for an envisioned investigation remains challenging due to the precise and atypical needs of the astrobiology researcher. In response, we have developed the Astrobiology Resource Metadata Standard (ARMS), a metadata standard designed to uniformly describe astrobiology "resources," that is, virtually any product of astrobiology research. Those resources include datasets, physical samples, software (modeling codes and scripts), publications, websites, images, videos, presentations, and so on. ARMS has been formulated to describe astrobiology resources generated by individual scientists or smaller scientific teams, rather than larger mission teams who may be required to use more complex archival metadata schemes. In the following, we discuss the participatory development process, give an overview of the metadata standard, describe its current use in practice, and close with a discussion of additional possible uses and extensions.


Subjects
Exobiology , Metadata , Software
12.
J Environ Manage ; 354: 120349, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38401497

ABSTRACT

Flow obstructed by bridge piers can increase sediment transport, leading to local scour. This local scour poses a risk to the stability of bridge structures, which could lead to structural failures. There are two main approaches for evaluating the scour depth (ds) of bridge piers. The first is based on understanding hydraulic phenomena and developing relationships with properties affecting scour. The second uses data-driven soft computing models that lack physical interpretations but rely on algorithms to predict outcomes. Researchers choose between these methods based on their goals and resources. This study aims to create innovative ensemble frameworks comprising support vector machine for regression (SVMR), random forest regression (RFR), and reduced error pruning tree (REPTree) as base learners, alongside bagging regression tree (BRT) and stochastic gradient boosting (SGB) as meta learners. These ensembles were developed to analyse maximum scour depths (dsm) in clear water conditions, using experimental data from 35 literature sources published over the last 63 years. The performance of each machine learning (ML) approach was assessed using statistical performance indicators. The proposed model was also compared with the top six empirical equations with strong predictive ability. Results show that among these empirical equations, the equation from Nandi and Das (2023) performs best. In the performance evaluation on the training, testing, and entire datasets, SGB (REPTree), BRT (SVMR-PUK), and SGB (REPTree), respectively, exhibited the highest performance, securing the top rank among all ML models and empirical equations. Sensitivity analysis identified sediment gradation and flow intensity as the most influential variables for predicting dsm during the training and testing phases, respectively.


Subjects
Metadata , Water , Algorithms , Machine Learning
13.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38217405

ABSTRACT

BACKGROUND: Applying good data management and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in research projects can help disentangle knowledge discovery, study result reproducibility, and data reuse in future studies. Based on the concepts of the original FAIR principles for research data, FAIR principles for research software were recently proposed. FAIR Digital Objects enable discovery and reuse of Research Objects, including computational workflows for both humans and machines. Practical examples can help promote the adoption of FAIR practices for computational workflows in the research community. We developed a multi-omics data analysis workflow implementing FAIR practices to share it as a FAIR Digital Object. FINDINGS: We conducted a case study investigating shared patterns between multi-omics data and childhood externalizing behavior. The analysis workflow was implemented as a modular pipeline in the workflow manager Nextflow, including containers with software dependencies. We adhered to software development practices like version control, documentation, and licensing. Finally, the workflow was described with rich semantic metadata, packaged as a Research Object Crate, and shared via WorkflowHub. CONCLUSIONS: Along with the packaged multi-omics data analysis workflow, we share our experiences adopting various FAIR practices and creating a FAIR Digital Object. We hope our experiences can help other researchers who develop omics data analysis workflows to turn FAIR principles into practice.
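
Packaging a workflow as a Research Object Crate comes down to writing a JSON-LD file named `ro-crate-metadata.json` next to the workflow files. The sketch below builds a minimal version of that structure following the RO-Crate 1.1 layout (a metadata descriptor, a root dataset, and the workflow file as the main entity). It is not the authors' crate. All values, including the workflow name, the `main.nf` file name, and the license, are placeholders, and the function name `minimal_workflow_crate` is hypothetical.

```python
import json


def minimal_workflow_crate(name, description, date_published, license_url, workflow_file):
    """Build the JSON-LD structure of a minimal ro-crate-metadata.json."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file and points at the root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root data entity: describes the crate as a whole.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {
                # The workflow file itself.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }


# Placeholder values for illustration only.
crate = minimal_workflow_crate(
    name="Example multi-omics workflow",
    description="Placeholder description of an analysis pipeline.",
    date_published="2024-01-02",
    license_url="https://spdx.org/licenses/MIT",
    workflow_file="main.nf",
)
crate_json = json.dumps(crate, indent=2)
```

Writing `crate_json` to `ro-crate-metadata.json` in the workflow directory would give a crate skeleton; a registry such as WorkflowHub may expect further properties beyond this minimum.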


Subjects
Multiomics , Software , Humans , Child , Workflow , Reproducibility of Results , Metadata
14.
Stud Health Technol Inform ; 310: 18-22, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269757

ABSTRACT

Adhering to FAIR principles (findability, accessibility, interoperability, reusability) ensures sustainability and reliable exchange of data and metadata. Research communities need common infrastructures and information models to collect, store, manage and work with data and metadata. The German initiative NFDI4Health created a metadata schema and an infrastructure integrating existing platforms based on different information models and standards. To ensure system compatibility and enhance data integration possibilities, we mapped the Investigation-Study-Assay (ISA) model to Fast Healthcare Interoperability Resources (FHIR). We present the mapping in FHIR logical models, the resulting network of FHIR resources, and the challenges that we encountered. The challenges were mainly related to ISA's genericness and to the different structures and datatypes used in ISA and FHIR. Mapping ISA to FHIR is feasible but requires further analyses of example data and adaptations to better specify target FHIR elements and to enable possible automated conversions from ISA to FHIR.
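
The kind of mapping discussed here can be sketched for a handful of fields. The paper's actual mapping is not reproduced in this abstract, so every field choice below is an assumption: an ISA-JSON study's `identifier`, `title`, and `description` are copied to the same-named elements of a FHIR `ResearchStudy` resource, the `status` value is a guessed default, and the function name `isa_study_to_fhir` is hypothetical. Fields without an obvious target are reported rather than dropped silently, which reflects the structural mismatch the authors describe.

```python
from __future__ import annotations


def isa_study_to_fhir(study: dict) -> tuple[dict, list[str]]:
    """Map a few ISA study fields to a FHIR ResearchStudy-shaped dictionary.

    Returns the resource and the names of the ISA fields left unmapped.
    """
    mapped = {"identifier", "title", "description"}
    resource = {
        "resourceType": "ResearchStudy",
        "status": "active",  # assumed default; ISA has no direct equivalent
    }
    if "identifier" in study:
        # FHIR identifiers are a list of structured Identifier objects.
        resource["identifier"] = [{"value": study["identifier"]}]
    if "title" in study:
        resource["title"] = study["title"]
    if "description" in study:
        resource["description"] = study["description"]
    unmapped = sorted(key for key in study if key not in mapped)
    return resource, unmapped


# Hypothetical ISA-JSON study fragment.
isa_study = {
    "identifier": "STUDY-001",
    "title": "Example cohort study",
    "description": "Placeholder description.",
    "submissionDate": "2024-01-25",
}
fhir_resource, unmapped_fields = isa_study_to_fhir(isa_study)
```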


Subjects
Drugs, Generic , Health Facilities , Humans , Metadata , Delivery of Health Care
15.
Stud Health Technol Inform ; 310: 68-73, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269767

ABSTRACT

Electronic health records (EHRs) and other real-world data (RWD) are critical to accelerating and scaling care improvement and transformation. To efficiently leverage them for secondary uses, EHR/RWD should be optimally managed and mapped to industry standard concepts (ISCs). Inherent challenges in concept encoding usually result in inefficient and costly workflows and resultant metadata representation structures outside the EHR. Using three related projects to map data to ISCs, we describe the development of standard, repeatable processes for precisely and unambiguously representing EHR data using appropriate ISCs within the EHR platform lifecycle and mappings specific to SNOMED-CT for Demographics, Specialty and Services. Mappings in these three areas resulted in ISC mappings of 779 data elements requiring 90 new concept requests to SNOMED-CT and 738 new ISCs mapped into the workflow within an accessible, enterprise-wide EHR resource with supporting processes.


Subjects
Learning Health System , Medicine , Electronic Health Records , Industry , Metadata
16.
Stud Health Technol Inform ; 310: 154-158, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269784

ABSTRACT

Decision-making in healthcare is heavily reliant on data that is findable, accessible, interoperable and reusable (FAIR). Evolving advancements in genomics also heavily rely on FAIR data to steer reliable research for the future. For practical purposes, ensuring FAIRness of a clinical data set can be challenging but could be aided by using FAIR validators. The study describes the test of two open-access web-tools in their demo versions to determine the FAIR levels of three submitted genomic data files with different formats (JSON, TXT, CSV). The F-UJI tool and FAIR-Checker tools provided similar FAIR scores for the three submitted files. However, the F-UJI tool assigned a total rating whereas the FAIR-Checker gave scores clustered by FAIR principles. Neither tool was suited to determine FAIR levels of a FHIR® JSON metadata file. Despite their early developmental status, FAIR validator tools have great potential to assist clinicians in the FAIRification of their research data.


Subjects
Genomics , Health Facilities , Metadata , Records
17.
Stud Health Technol Inform ; 310: 599-603, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269879

ABSTRACT

We here report on one of the outcomes of a large-scale German research program, the Medical Informatics Initiative (MII), aiming at the development of a solid data and software infrastructure for German-language clinical natural language processing. Within this framework, we have developed 3000PA, a national clinical reference corpus composed of patient records from three clinical university sites and annotated with a multitude of semantic annotation layers (including medical named entities, semantic and temporal relations between entities, as well as certainty and negation information related to entities and relations). This non-sharable corpus has been complemented by three sharable ones (JSYNCC, GGPONC, and GRASCCO). Overall, 3000PA, JSYNCC and GRASCCO feature about 2.1 million metadata points.


Subjects
Language , Medical Informatics , Humans , Semantics , Metadata , Natural Language Processing
18.
Database (Oxford) ; 2024, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38204360

ABSTRACT

There is growing evidence that comprehensive and harmonized metadata are fundamental for effective public data reusability. However, it is often challenging to extract accurate metadata from public repositories. Of particular concern are the metagenomic data related to African individuals, which often omit important information about the particular features of these populations. As part of a collaborative consortium, H3ABioNet, we created a web portal, the African Human Microbiome Portal (AHMP), exclusively dedicated to metadata related to African human microbiome samples. Metadata were collected from various public repositories and then cleaned, curated and harmonized according to a pre-established guideline and using ontology terms. These metadata sets can be accessed at https://microbiome.h3abionet.org/. This web portal is open access and offers an interactive visualization of 14 889 records from 70 bioprojects associated with 72 peer-reviewed research articles. It also offers the ability to download harmonized metadata according to the user's applied filters. The AHMP thereby supports metadata search and retrieval, thus facilitating access to relevant studies linked to the African human microbiome. Database URL: https://microbiome.h3abionet.org/.


Subjects
Metadata , Microbiota , Humans , Metagenome , Databases, Factual , Metagenomics , Microbiota/genetics
19.
Sci Data ; 11(1): 112, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263211

ABSTRACT

Here we provide a curated, large-scale, label-free mass spectrometry-based proteomics data set derived from HeLa cell lines for general-purpose machine learning and analysis. Data access and filtering is a tedious task that takes up considerable amounts of researchers' time. Therefore, we provide machine-based metadata for easy selection and overview alongside the 7,444 raw files and MaxQuant search output. For convenience, we provide three filtered and aggregated development datasets on the protein group, peptide and precursor levels. In addition to providing easy-to-access training data, we provide an SDRF file annotating each raw file with instrument settings, allowing automated reprocessing. We encourage others to enlarge this data set with instrument runs of further HeLa samples from different machine types, and we provide our workflows and analysis scripts for this purpose.
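
An SDRF file is a tab-delimited table with one row per sample-to-file relationship, so selecting raw files by instrument can be done with a standard TSV reader. The sketch below is a minimal illustration under stated assumptions: the three rows are invented, not taken from the published data set, and the column names `comment[data file]` and `comment[instrument]` follow the SDRF-Proteomics convention but should be checked against the actual file. The function name `files_by_instrument` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented SDRF-like content; a real file would be read from disk.
SDRF_TEXT = (
    "source name\tcomment[data file]\tcomment[instrument]\n"
    "HeLa 1\trun_001.raw\tinstrument A\n"
    "HeLa 2\trun_002.raw\tinstrument B\n"
    "HeLa 3\trun_003.raw\tinstrument A\n"
)


def files_by_instrument(sdrf_text: str) -> dict:
    """Group raw file names by the instrument recorded for each row."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["comment[instrument]"]].append(row["comment[data file]"])
    return dict(groups)


groups = files_by_instrument(SDRF_TEXT)
```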


Subjects
HeLa Cells , Machine Learning , Proteomics , Humans , Mass Spectrometry , Metadata
20.
Sci Data ; 11(1): 143, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38291027

ABSTRACT

Data on the movement and space use of aquatic animals are crucial to understand complex interactions among biotic and abiotic components of ecosystems and facilitate effective conservation and management. Acoustic telemetry (AT) is a leading method for studying the movement ecology of aquatic animals worldwide, yet the ability to efficiently access study information from AT research is currently lacking, limiting advancements in its application. Here, we describe TrackdAT, an open-source metadata dataset where AT research parameters are catalogued to provide scientists, managers, and other stakeholders with the ability to efficiently identify and evaluate existing peer-reviewed research. Extracted metadata encompasses key information about biological and technical aspects of research, providing a comprehensive summary of existing AT research. TrackdAT currently hosts information from 2,412 journal articles published from 1969 to 2022 spanning 614 species and 380,289 tagged animals. TrackdAT has the potential to enable regional and global mobilization of knowledge, increased opportunities for collaboration, greater stakeholder engagement, and optimization of future ecological research.


Subjects
Ecosystem , Metadata , Telemetry , Animals , Acoustics , Movement , Telemetry/methods